Parallel Classification on SMP Systems

Authors

  • Mohammed J. Zaki
  • Rakesh Agrawal
Abstract

This paper presents fast, scalable decision-tree-based classification algorithms targeting shared-memory systems. The algorithms are based on the sequential SPRINT classifier and span the gamut of data and task parallelism. The data parallelism is based on attribute scheduling among processors. This is extended with task pipelining and dynamic load balancing to yield more efficient schemes. The task-parallel approach uses dynamic subtree partitioning among processors. These schemes are disk-based and achieve excellent speedup, making them ideally suited for data mining in very large databases.
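As a rough illustration of attribute scheduling with dynamic load balancing, the sketch below lets worker threads claim attributes one at a time from a shared counter and evaluate the best Gini split for each. It is a minimal sketch, not the authors' implementation: the function names (`gini`, `best_split`, `parallel_best_attribute`), the in-memory columns, and the use of the Gini index as the split criterion are assumptions made for the example, and the disk-based attribute lists, task pipelining, and subtree partitioning described in the abstract are not modeled.

```python
import threading
from collections import Counter


def gini(counts, n):
    """Gini impurity of a class histogram holding n records."""
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (weighted_gini, threshold) of the best 'value <= threshold' split."""
    # Sorting here stands in for a presorted attribute list (an assumption).
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    below, above = Counter(), Counter(labels)
    best = (float("inf"), None)
    for i in range(n - 1):
        value, label = pairs[i]
        below[label] += 1
        above[label] -= 1
        if value == pairs[i + 1][0]:
            continue  # no split point between equal values
        left = i + 1
        score = (left * gini(below, left) + (n - left) * gini(above, n - left)) / n
        if score < best[0]:
            best = (score, (value + pairs[i + 1][0]) / 2)
    return best


def parallel_best_attribute(columns, labels, num_workers=2):
    """Workers claim attribute indices dynamically; the best split overall wins."""
    next_attr = 0
    lock = threading.Lock()
    results = {}

    def worker():
        nonlocal next_attr
        while True:
            with lock:  # claim the next unprocessed attribute
                if next_attr >= len(columns):
                    return
                attr = next_attr
                next_attr += 1
            split = best_split(columns[attr], labels)
            with lock:
                results[attr] = split

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    best_attr = min(results, key=lambda a: (results[a][0], a))
    return best_attr, results[best_attr]


if __name__ == "__main__":
    columns = [[1, 2, 3, 4], [5, 1, 5, 1]]  # toy data, two attributes
    labels = ["a", "a", "b", "b"]
    print(parallel_best_attribute(columns, labels))
```

Because each worker takes the next free attribute rather than a fixed share, a slow attribute does not leave other workers idle. In CPython the global interpreter lock limits real speedup for this CPU-bound loop, so the example shows only the scheduling pattern.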

Similar resources

Parallel Classification for Data Mining on Shared-Memory Multiprocessors

We present parallel algorithms for building decision-tree classifiers on shared-memory multiprocessor (SMP) systems. The proposed algorithms span the gamut of data and task parallelism. The data parallelism is based on attribute scheduling among processors. This basic scheme is extended with task pipelining and dynamic load balancing to yield faster implementations. The task parallel approach u...

A New Prediction Oriented Barrier Synchronization on SMP Clusters

Clusters of Symmetric Multiprocessors (CSMP) are becoming an increasingly popular high-performance computing platform due to the commodity availability of multiprocessor nodes, mature SMP operating systems, low-latency, high-bandwidth data networks, and superior price-performance ratio. Fast synchronization is crucial to making efficient use of SMP clusters. In this paper, we focus on one kind o...

Scalable Data Mining for Rules

Data Mining is the process of automatic extraction of novel, useful, and understandable patterns in very large databases. High-performance scalable and parallel computing is crucial for ensuring system scalability and interactivity as datasets grow inexorably in size and complexity. This thesis deals with both the algorithmic and systems aspects of scalable and parallel data mining algorithms a...

A Taxonomy of Programming Models for Symmetric Multiprocessors and SMP Clusters

The basic processing element, from PCs to large systems, is rapidly becoming a symmetric multiprocessor (SMP). As a result, the nodes of a parallel computer will often be an SMP. The resulting mixed hardware models (combining shared-memory and distributed memory) provide a challenge to system software developers to provide users with programming models that are portable, understandable, and eff...

An SMP soft classification algorithm for remote sensing

This work introduces a symmetric multiprocessing (SMP) version of the continuous iterative guided spectral class rejection (CIGSCR) algorithm, a semiautomated classification algorithm for remote sensing (multispectral) images. The algorithm uses soft data clusters to produce a soft classification containing inherently more information than a comparable hard classification at an increased comput...

Publication date: 1998